{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "Q99VyIWMg1We"
      },
      "source": [
        "##### Copyright 2020 Google LLC."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "cellView": "form",
        "colab": {},
        "colab_type": "code",
        "id": "_G9cIHu8h_Bp"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "_4hpjDa5f2l6"
      },
      "source": [
        "# Neural Voxel Renderer\n",
        "\n",
        "\n",
        "![](https://storage.googleapis.com/tensorflow-graphics/notebooks/neural_voxel_renderer/teaser2-01.png)\n",
        "\n",
        "This notebook illustrates how to run [Neural Voxel Renderer](https://arxiv.org/abs/1912.04591) (CVPR2020). The input scene consists of a voxelized object, the ground, and a point light source and the network produces an image that correspond to the setting of the scene. In this version of the code, all operations are differentiable. \n",
        "\n",
        "\n",
        "\n",
        "![](https://storage.googleapis.com/tensorflow-graphics/notebooks/neural_voxel_renderer/1541-teaser.gif)\n",
        "\n",
        "\n",
        "\n",
        "\n",
        "\n",
        "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/graphics/blob/master/tensorflow_graphics/projects/neural_voxel_renderer/demo.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/graphics/blob/master/tensorflow_graphics/projects/neural_voxel_renderer/demo.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on GitHub\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "\u003c/table\u003e\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "V1TrE87Ogclv"
      },
      "source": [
        "## Setup and imports\n",
        "\n",
        "If Tensorflow Graphics is not installed on your system, the following cell can install the Tensorflow Graphics package for you.\n",
        "\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "WyVsJlt6lAnC"
      },
      "outputs": [],
      "source": [
        "!pip install tensorflow_graphics"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "10-O_KTXgL6T"
      },
      "outputs": [],
      "source": [
        "import pickle\n",
        "import matplotlib.pyplot as plt\n",
        "import numpy as np\n",
        "import tensorflow as tf\n",
        "\n",
        "from tensorflow_graphics.projects.neural_voxel_renderer import helpers\n",
        "from tensorflow_graphics.projects.neural_voxel_renderer import models\n",
        "from tensorflow_graphics.rendering.voxels import visual_hull"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "UHhaSuk7qq4J"
      },
      "source": [
        "## Download and extract example data \n",
        "\n",
        "The input object is a chair from the ShapeNet database, under MIT license."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "bNoBbo63gqft"
      },
      "outputs": [],
      "source": [
        "!rm -f /tmp/example_data.p*\n",
        "!wget -P /tmp/ https://storage.googleapis.com/tensorflow-graphics/notebooks/neural_voxel_renderer/example_data.p\n",
        "\n",
        "with open('/tmp/example_data.p', 'rb') as f:\n",
        "  example_data = pickle.load(f)\n",
        "\n",
        "object_voxels = example_data['object_voxels']\n",
        "camera_rotation_matrix = example_data['camera_rotation_matrix']\n",
        "camera_translation_vector = example_data['camera_translation_vector']\n",
        "focal = example_data['focal']\n",
        "principal_point = example_data['principal_point']\n",
        "light_position = example_data['light_position']\n",
        "object_rotation = example_data['object_rotation']\n",
        "object_translation = example_data['object_translation']\n",
        "object_elevation = example_data['object_elevation']"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "2pPlbUDVsbDp"
      },
      "source": [
        "## Voxel Placement\n",
        "\n",
        "The first input of the network is the scene voxels (object + ground). The object voxels extracted above cover a smaller area in the 3D world, thus they need to be placed in the volume that the network \"sees\" as a scene (and adjust also for the camera elevation).\n",
        "\n",
        "\n",
        " \u003cimg src=https://storage.googleapis.com/tensorflow-graphics/notebooks/neural_voxel_renderer/world_voxel_coords.png width=\"400\"\u003e\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "K8vQsy3Fj2ab"
      },
      "outputs": [],
      "source": [
        "VOXEL_SIZE = 128\n",
        "IMAGE_SIZE = 256\n",
        "\n",
        "BLENDER_SCALE = 2\n",
        "DIAMETER = 4.2  # The voxel area in world coordinates\n",
        "GROUND_COLOR = np.array((136., 162, 199))/255.\n",
        "\n",
        "object_rotation_v = object_rotation\n",
        "object_translation_v = object_translation[:, 0, [1, 0, 2]]*BLENDER_SCALE\n",
        "object_elevation_v = object_elevation\n",
        "\n",
        "ground_occupancy = np.zeros((VOXEL_SIZE, VOXEL_SIZE, VOXEL_SIZE, 1),\n",
        "                            dtype=np.float32)\n",
        "ground_occupancy[-2, 1:-2, 1:-2, 0] = 1\n",
        "ground_voxel_color = np.ones((VOXEL_SIZE, VOXEL_SIZE, VOXEL_SIZE, 3), \n",
        "                             dtype=np.float32)*\\\n",
        "                     np.array(GROUND_COLOR, dtype=np.float32)\n",
        "ground_voxel_color = np.concatenate([ground_voxel_color, ground_occupancy],\n",
        "                                    axis=-1)\n",
        "\n",
        "scene_voxels = object_voxels*(1-ground_occupancy) + \\\n",
        "                ground_voxel_color*ground_occupancy\n",
        "\n",
        "euler_angles_x = np.deg2rad(180-object_rotation_v)*np.array([1, 0, 0],\n",
        "                                                            dtype=np.float32)\n",
        "euler_angles_y = np.deg2rad(90-object_elevation_v)*np.array([0, 1, 0],\n",
        "                                                            dtype=np.float32)\n",
        "translation_vector = (object_translation_v/(DIAMETER*0.5))\n",
        "\n",
        "interpolated_voxels = helpers.object_to_world(scene_voxels,\n",
        "                                              euler_angles_x,\n",
        "                                              euler_angles_y,\n",
        "                                              translation_vector)\n",
        "\n",
        "color_input, alpha_input = tf.split(interpolated_voxels, [3, 1], axis=-1)\n",
        "voxel_img = visual_hull.render(color_input*alpha_input)\n",
        "\n",
        "_, ax = plt.subplots(1, 1, figsize=(5, 5))\n",
        "ax.imshow(voxel_img[0])\n",
        "ax.axis('off')\n",
        "ax.set_title('Voxel Visualization')\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "_KjnHkG2qgOv"
      },
      "source": [
        "## Synthesizing an image using the voxels\n",
        "\n",
        "The second input of the NVR+ network is an image directly rendered from the voxels. In the original paper, this was performed by splatting the center of the voxels in the image plane. In this notebook, the image is synthesized using a differentiable volumetric renderer.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "F5Qyk_PUrGfj"
      },
      "source": [
        "### Estimation of the ground image.\n",
        "\n",
        "The ground corresponds to a plane in 3D with know vertex locations."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "RHGao80FhVdA"
      },
      "outputs": [],
      "source": [
        "ground_image, ground_alpha = \\\n",
        "  helpers.generate_ground_image(IMAGE_SIZE, IMAGE_SIZE, focal, principal_point,\n",
        "                                camera_rotation_matrix,\n",
        "                                camera_translation_vector[:, :, 0],\n",
        "                                GROUND_COLOR)\n",
        "\n",
        "_, ax = plt.subplots(1, 1, figsize=(6, 6))\n",
        "ax.imshow((ground_image*ground_alpha)[0])\n",
        "ax.axis('off')\n",
        "ax.set_title('Ground image')\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "hK70y5EZro4x"
      },
      "source": [
        "### Differentiable Volume Rendering\n",
        "\n",
        "Given the colored voxels and the camera parameters we can estimate a rendered image of the input voxels using volumetric rendering techniques (see [Henzler et al](https://geometry.cs.ucl.ac.uk/projects/2019/platonicgan/) ICCV 2019). Briefly, for each pixel we cast a ray towards the volumetric scene and we estimate its final color based on the occupancy and color of the intersected voxels. \n",
        "\n",
        "Note that this image contains the original colors of the voxels (with projective texturing artifacts due to the different orientation of the object) and does not include shadows or other illumination effects."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "ibYSw2Fxjjes"
      },
      "outputs": [],
      "source": [
        "object_rotation_dvr = np.array(np.deg2rad(object_rotation),\n",
        "                                dtype=np.float32)\n",
        "object_translation_dvr = np.array(object_translation[..., [0, 2, 1]], \n",
        "                                  dtype=np.float32)\n",
        "object_translation_dvr -= np.array([0, 0, helpers.OBJECT_BOTTOM],\n",
        "                                    dtype=np.float32)\n",
        "\n",
        "rerendering = \\\n",
        "  helpers.render_voxels_from_blender_camera(object_voxels,\n",
        "                                    object_rotation_dvr,\n",
        "                                    object_translation_dvr,\n",
        "                                    256, \n",
        "                                    256,\n",
        "                                    focal,\n",
        "                                    principal_point,\n",
        "                                    camera_rotation_matrix,\n",
        "                                    camera_translation_vector,\n",
        "                                    absorption_factor=1.0,\n",
        "                                    cell_size=1.1,\n",
        "                                    depth_min=3.0,\n",
        "                                    depth_max=5.0,\n",
        "                                    frustum_size=(128, 128, 128))\n",
        "rerendering_image, rerendering_alpha = tf.split(rerendering, [3, 1], axis=-1)\n",
        "\n",
        "rerendering_image = tf.image.resize(rerendering_image, (256, 256))\n",
        "rerendering_alpha = tf.image.resize(rerendering_alpha, (256, 256))\n",
        "\n",
        "BACKGROUND_COLOR = 0.784\n",
        "final_composite = BACKGROUND_COLOR*(1-rerendering_alpha)*(1-ground_alpha) + \\\n",
        "                  ground_image*(1-rerendering_alpha)*ground_alpha + \\\n",
        "                  rerendering_image*rerendering_alpha\n",
        "\n",
        "_, ax = plt.subplots(1, 2, figsize=(10, 10))\n",
        "ax[0].imshow(rerendering_image[0])\n",
        "ax[0].axis('off')\n",
        "ax[0].set_title('Object rerendering')\n",
        "ax[1].imshow(final_composite[0])\n",
        "ax[1].axis('off')\n",
        "ax[1].set_title('Final composition')\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "cobjHJ1Otak1"
      },
      "source": [
        "## Running the Network\n",
        " \n",
        " \u003cimg src=https://storage.googleapis.com/tensorflow-graphics/notebooks/neural_voxel_renderer/rerender_architecture.png width=\"400\"\u003e\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "1Xvw6RXvhcmc"
      },
      "outputs": [],
      "source": [
        "# Downloading the data\n",
        "!rm -rf /tmp/checkpoint\n",
        "!mkdir /tmp/checkpoint\n",
        "!wget -P /tmp/checkpoint https://storage.googleapis.com/tensorflow-graphics/notebooks/neural_voxel_renderer/model.ckpt-126650.data-00000-of-00001\n",
        "!wget -P /tmp/checkpoint https://storage.googleapis.com/tensorflow-graphics/notebooks/neural_voxel_renderer/model.ckpt-126650.index\n",
        "!wget -P /tmp/checkpoint https://storage.googleapis.com/tensorflow-graphics/notebooks/neural_voxel_renderer/model.ckpt-126650.meta"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "OKtqVfi4ftFZ"
      },
      "outputs": [],
      "source": [
        "latest_checkpoint = '/tmp/checkpoint/model.ckpt-126650'\n",
        "\n",
        "tf.compat.v1.reset_default_graph()\n",
        "g = tf.compat.v1.Graph()\n",
        "with g.as_default():\n",
        "  vol_placeholder = tf.compat.v1.placeholder(tf.float32,\n",
        "                                    shape=[None, VOXEL_SIZE, VOXEL_SIZE, VOXEL_SIZE, 4],\n",
        "                                    name='input_voxels')\n",
        "  rerender_placeholder = tf.compat.v1.placeholder(tf.float32,\n",
        "                                        shape=[None, IMAGE_SIZE, IMAGE_SIZE, 3],\n",
        "                                        name='rerender')\n",
        "  light_placeholder = tf.compat.v1.placeholder(tf.float32,\n",
        "                                      shape=[None, 3],\n",
        "                                      name='input_light')\n",
        "  model = models.neural_voxel_renderer_plus(vol_placeholder,\n",
        "                                            rerender_placeholder,\n",
        "                                            light_placeholder)\n",
        "  predicted_image_logits, = model.outputs\n",
        "  saver = tf.compat.v1.train.Saver()\n",
        "\n",
        "a = interpolated_voxels.numpy()\n",
        "b = final_composite.numpy()*2.-1\n",
        "c = light_position\n",
        "with tf.compat.v1.Session(graph=g) as sess:\n",
        "  saver.restore(sess, latest_checkpoint)\n",
        "  feed_dict = {vol_placeholder: a,\n",
        "               rerender_placeholder: b,\n",
        "               light_placeholder: c}\n",
        "  predictions = sess.run(predicted_image_logits, feed_dict)\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "cellView": "form",
        "colab": {},
        "colab_type": "code",
        "id": "eLug7iHHPmD4"
      },
      "outputs": [],
      "source": [
        "#@title NVR+ Output { vertical-output: true, run: \"auto\" }\n",
        "\n",
        "view = 8 #@param {type:\"slider\", min:0, max:9, step:1}\n",
        "\n",
        "_, ax = plt.subplots(1, 1, figsize=(5, 5))\n",
        "ax.imshow(predictions[view]*0.5+0.5)\n",
        "ax.axis('off')\n",
        "ax.set_title('NVR+ prediction')\n",
        "plt.show()"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "last_runtime": {
        "build_target": "",
        "kind": "local"
      },
      "name": "Running neural voxel renderer.ipynb",
      "provenance": [],
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
